Massive disambiguation of large text corpora with flexible categorial grammar

Authors

  • Ton van der Wouden
  • Dirk Heylen
Abstract

A new method of automatic lexical disambiguation of big texts is described, using recent proof-theoretical results from the theory of categorial grammar.
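The abstract does not say which proof-theoretical results are used. One well-known result of this kind for the Lambek calculus is van Benthem's count invariance: for every atomic category, the occurrences counted over the words' categories (a functor's count is its result's count minus its argument's count) must equal the count in the goal category. Assuming, for illustration only, that such a count check drives the disambiguation, the sketch below prunes every lexical category that cannot take part in any count-balanced assignment for the sentence. The names `Atom`, `Right`, `Left`, `count` and `prune` are hypothetical, and the check is a necessary condition only: surviving assignments still have to be parsed.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union

# Hypothetical category representation (Lambek-style notation).
@dataclass(frozen=True)
class Atom:
    name: str  # e.g. "s", "np"

@dataclass(frozen=True)
class Right:
    result: "Cat"  # A/B: yields `result` when followed by `arg`
    arg: "Cat"

@dataclass(frozen=True)
class Left:
    arg: "Cat"  # B\A: yields `result` when preceded by `arg`
    result: "Cat"

Cat = Union[Atom, Right, Left]
Vec = Tuple[Tuple[str, int], ...]  # sparse count vector, zero entries dropped

def norm(d: Dict[str, int]) -> Vec:
    return tuple(sorted((k, v) for k, v in d.items() if v != 0))

def add(a: Vec, b: Vec) -> Vec:
    d = dict(a)
    for k, v in b:
        d[k] = d.get(k, 0) + v
    return norm(d)

def count(cat: Cat) -> Vec:
    """An atom counts 1; a functor counts its result minus its argument."""
    if isinstance(cat, Atom):
        return ((cat.name, 1),)
    d = dict(count(cat.result))
    for k, v in count(cat.arg):
        d[k] = d.get(k, 0) - v
    return norm(d)

def prune(lexicon: List[List[Cat]], goal: Cat) -> List[List[Cat]]:
    """Keep a word's category only if some choice for the other words balances the counts."""
    n, target = len(lexicon), count(goal)
    # forward[i]: count sums reachable over words 0..i-1
    forward: List[Set[Vec]] = [{()}]
    for cats in lexicon:
        forward.append({add(s, count(c)) for s in forward[-1] for c in cats})
    # backward[i]: count sums reachable over words i..n-1
    backward: List[Set[Vec]] = [set() for _ in range(n)] + [{()}]
    for i in range(n - 1, -1, -1):
        backward[i] = {add(count(c), s) for c in lexicon[i] for s in backward[i + 1]}
    return [
        [c for c in cats
         if any(add(add(f, count(c)), b) == target
                for f in forward[i] for b in backward[i + 1])]
        for i, cats in enumerate(lexicon)
    ]

# Usage: "John loves Mary" with spurious readings np for "loves" and s for "Mary".
NP, S = Atom("np"), Atom("s")
TV = Right(Left(NP, S), NP)  # (np\s)/np, a transitive verb
print(prune([[NP], [TV, NP], [NP, S]], S))
```

For this toy input only np, (np\s)/np and np survive. Forward and backward sets of count vectors avoid enumerating every combination of lexical choices, which is what makes a filter of this kind attractive when ambiguity is massive.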

Similar sources

Towards High Speed Grammar Induction on Large Text Corpora

In this paper we describe an efficient and scalable implementation for grammar induction based on the EMILE approach ([2], [3],[4], [5], [6]). The current EMILE 4.1 implementation ([11]) is one of the first efficient grammar induction algorithms that work on free text. Although EMILE 4.1 is far from perfect, it enables researchers to do empirical grammar induction research on various types of corpora...

Translating Treebank Annotation For Evaluation

In this paper we discuss the need for corpora with a variety of annotations to provide suitable resources to evaluate different Natural Language Processing systems and to compare them. A supervised machine learning technique is presented for translating corpora between syntactic formalisms and is applied to the task of translating the Penn Treebank annotation into a Categorial Grammar annotatio...

Vowel Sound Disambiguation for Intelligible Korean Speech Synthesis

For speech synthesis systems that transform text materials into voice data, correctness and naturalness are the crucial measures of performance, the latter gaining more emphasis recently. In order to make synthesized voices natural, we must take into account pronunciation-related linguistic phenomena such as homograph, among others. The syntax certainly provides an important clue to disambiguat...

Acquisition of Large Scale Categorial Grammar Lexicons

A system is presented for inducing Categorial Grammar (CG) lexicons for natural language from either unannotated or minimally annotated corpora extracted from the Penn Treebank. A combination of symbolic and stochastic methods have been used to build a computationally effective and psychologically plausible system, which learns linguistically useful lexicons. There are a variety of parameters in...

CCG Syntactic Reordering Models for Phrase-based Machine Translation

Statistical phrase-based machine translation requires no linguistic information beyond word-aligned parallel corpora (Zens et al., 2002; Koehn et al., 2003). Unfortunately, this linguistic agnosticism often produces ungrammatical translations. Syntax, or sentence structure, could provide guidance to phrase-based systems, but the “non-constituent” word strings that phrase-based decoders manipulat...

Publication date: 1988